Automatic Acronym Acquisition and Term Variation Management within Domain-Specific Texts

Authors

  • Goran Nenadic
  • Irena Spasic
  • Sophia Ananiadou
Abstract

In this paper we present a framework for the effective management of terms and their variants that are automatically acquired from domain-specific texts. In our approach, term variant recognition is incorporated into the automatic term retrieval process by taking into account orthographical, morphological, syntactic, lexico-semantic and pragmatic term variations. In particular, we address acronyms as a common way of introducing term variants in scientific papers. We describe a method for the automatic acquisition of newly introduced acronyms and their mapping to their 'meanings', i.e. the corresponding terms. The proposed three-step procedure is based on morpho-syntactic constraints that are commonly used in acronym definitions. First, acronym definitions containing an acronym and the corresponding term are retrieved. Second, these two elements are matched by performing morphological analysis of the words and combining forms that constitute the term. Third, the problems of acronym variation and acronym ambiguity are addressed by establishing classes of term variants that correspond to specific concepts. We present the results of acronym acquisition in the domain of molecular biology: the precision of the method ranged from 94% to 99% depending on the size of the corpus used for evaluation, whilst the recall was 73%.

* This research is part of the BioPATH research project, coordinated by LION BioScience (http://www.lionbioscience.com) and funded by the German Ministry of Research.
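The three-step procedure summarised in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: it assumes definitions of the form "long form (ACRONYM)", replaces the morphological analysis of step two with a plain first-letter comparison, and approximates the variant classes of step three by lowercasing the long form and treating hyphens as spaces. The names `DEFINITION`, `match_long_form` and `acquire` are hypothetical.

```python
import re
from collections import defaultdict

# Step 1 (assumed pattern): up to eight words followed by a parenthesised acronym.
DEFINITION = re.compile(
    r"\b((?:[A-Za-z][\w-]*\s+){1,8})\(([A-Z][A-Za-z0-9-]{1,9})\)"
)


def match_long_form(words, acronym):
    """Step 2 (simplified): return the trailing words whose initials spell
    the acronym, or None if the initials do not match."""
    letters = [c.lower() for c in acronym if c.isalnum()]
    n = len(letters)
    if n == 0 or n > len(words):
        return None
    candidate = words[-n:]
    if all(w[0].lower() == l for w, l in zip(candidate, letters)):
        return " ".join(candidate)
    return None


def acquire(text):
    """Step 3 (simplified): group matched definitions into classes keyed by
    a normalised long form, collecting acronyms and surface variants."""
    classes = defaultdict(lambda: {"acronyms": set(), "forms": set()})
    for m in DEFINITION.finditer(text):
        words = m.group(1).split()
        acronym = m.group(2)
        long_form = match_long_form(words, acronym)
        if long_form is None:
            continue  # no initial-letter match: discard the candidate
        key = long_form.lower().replace("-", " ")
        classes[key]["acronyms"].add(acronym)
        classes[key]["forms"].add(long_form)
    return dict(classes)


sample = (
    "The retinoic acid receptor (RAR) binds DNA. Later, the Retinoic Acid "
    "Receptor (RAR) is discussed, and nuclear receptors (XYZ) are skipped."
)
for term, info in acquire(sample).items():
    print(term, sorted(info["acronyms"]), sorted(info["forms"]))
```

On the sample text, both spellings of the long form fall into one class under the acronym RAR, while the "(XYZ)" candidate is rejected because its letters do not match the preceding words. A first-letter heuristic of this kind misses acronyms built from combining forms or word-internal letters, which is the gap the morphological analysis described in the abstract is meant to address.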


Similar resources

Term Extraction and Mining of Term Relations from Unrestricted Texts in the Financial Domain

In this paper, we present an unsupervised hybrid text-mining approach to automatic acquisition of domain relevant terms and their relations. We deploy the TF-IDF-based term classification method to acquire domain relevant terms. Further, we apply two strategies in order to learn lexico-syntactic patterns which indicate paradigmatic and domain relevant syntagmatic relations between the extracted ter...


Natural Language Processing And The Automatic Acquisition Of Knowledge: A Simulative Approach

The paper presents the general design and the first results of a research project whose long term goal is to develop and implement ALICE, an experimental system capable of augmenting its knowledge base by processing natural language texts. ALICE (an acronym for Automatic Learning and Inference Computerized Engine) is an attempt to model the cognitive processes that occur in humans when the...


Terminology-driven literature mining and knowledge acquisition in biomedicine

In this paper we describe Tagged Information Management System (TIMS), an integrated knowledge management system for the domain of molecular biology and biomedicine, in which terminology-driven literature mining, knowledge acquisition (KA), knowledge integration (KI), and XML-based knowledge retrieval are combined using tag information management and ontology inference. The system integrates au...


Acronyms as an Integral Part of Multi-Word Term Recognition - A Token of Appreciation

Term conflation is the process of linking together different variants of the same term. In automatic term recognition approaches, all term variants should be aggregated into a single normalized term representative, which is associated with a single domain-specific concept as a latent variable. In a previous study, we described FlexiTerm, an unsupervised method for recognition of multi-word term...


Recognizing Acronyms and their Definitions in Swedish Medical Texts

This paper addresses the task of recognizing acronym-definition pairs in Swedish (medical) texts as well as the compilation of a freely available sample of such manually annotated pairs. A material suitable not only for supervised learning experiments, but also as a testbed for the evaluation of the quality of future acronym-definition recognition systems. There are a number of approaches to th...




Publication date: 2002